#FP8 training · 30/10/2025
Ant Group Unveils Ling 2.0 — Scaling Sparse MoE Reasoning to 1T with 1/32 Activation
Ling 2.0 is a reasoning-first sparse MoE family from Ant Group that keeps per-token compute low with a 1/32 activation recipe while scaling from 16B to 1T parameters.
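To make the 1/32 activation recipe concrete, the sketch below estimates how many parameters a sparse MoE actually touches per token. The announcement only states the activation ratio and the 16B-to-1T range; the function name `active_params` and the `expert_share` split are illustrative assumptions, not figures from Ling 2.0.

```python
# Illustrative sketch (not from the announcement): what a 1/32 expert
# activation ratio implies for per-token compute in a sparse MoE.
# The dense/expert parameter split below is a hypothetical assumption.

def active_params(total_params: float, expert_share: float,
                  activation_ratio: float) -> float:
    """Estimate parameters touched per token in a sparse MoE.

    total_params     -- total model parameters
    expert_share     -- assumed fraction of parameters held in MoE experts
    activation_ratio -- fraction of expert parameters used per token
                        (e.g. 8 routed experts out of 256 -> 1/32)
    """
    dense = total_params * (1.0 - expert_share)      # always-active layers
    experts = total_params * expert_share            # sparsely routed experts
    return dense + experts * activation_ratio

if __name__ == "__main__":
    for total in (16e9, 1e12):                       # the 16B and 1T endpoints
        est = active_params(total,
                            expert_share=0.9,        # assumed, not from Ling 2.0
                            activation_ratio=1 / 32)
        print(f"{total/1e9:,.0f}B total -> ~{est/1e9:.1f}B active per token")
```

Under these assumed splits, per-token compute grows far more slowly than total parameter count, which is the core appeal of the 1/32 recipe; the exact active-parameter figures for each Ling 2.0 model would depend on its real dense/expert breakdown.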